Health informatics sits at the intersection of medicine, data science, and technology, and it is changing how health information is stored, analyzed, and used. The field helps clinicians and researchers find patterns in patient data, improve diagnostic accuracy, and personalize treatment plans. By turning raw medical records into actionable insights, it is reshaping healthcare delivery and population health management.

At Gist.Science, we bridge the gap between current research and public understanding by curating the latest medRxiv preprints in this domain. Our team processes every new submission in this category and provides both a plain-language explanation and a detailed technical summary, so the science is clear to everyone from policymakers to curious readers. Below are the latest papers in health informatics.

Unmeasured but Not Unbiased: The Missingness Demographic Leakage Audit (MDLA) for Calibration-Aware Fairness Evaluation in Critical Care Mortality Prediction

This paper introduces the Missingness Demographic Leakage Audit (MDLA), a reproducible framework that reveals how patterns of missing clinical data in critical care mortality models can act as subtle, unmeasured demographic proxies, necessitating the integration of missingness-aware auditing and calibration-aware evaluation into clinical AI validation pipelines.

Patel, K., Beedala, P. | 2026-05-03 | health informatics

Disease Risk Prediction Using Structured EHR Data: Can Generalist Large Language Models Match Specialized Clinical Foundation Models? A Comparative Evaluation with Fine-Tuning

This comparative evaluation demonstrates that while fine-tuned generalist large language models generally underperform specialized clinical foundation models on structured EHR disease risk prediction, LLM-generated embeddings paired with lightweight classifiers can achieve superior performance across both AUROC and AUPRC metrics.

Mao, B., Prasadha, M. K., Xie, Z., He, J., Ghebranious, M., Xu, H., Zhi, D., Rasmy, L. | 2026-05-01 | health informatics

Protocol for the REVELIO test-track pilot study: a randomised, controlled, single-centre trial in healthy recreational cannabis users investigating real-time in-vehicle detection of cannabis-impaired driving

The REVELIO protocol outlines a randomized, controlled pilot study on a closed test track designed to evaluate the feasibility of a multimodal in-vehicle system for detecting cannabis-impaired driving in healthy recreational users by correlating vehicle, driver, and biological data following controlled THC administration.

Bechny, M., Deuber, R., Heck, C., Brügger, J., Pfäffli, M., Jovanova, M., Fleisch, E., Wortmann, F., Weinmann, W. | 2026-05-01 | health informatics

AERO: An AI Agent for Adaptive Eligibility Refinement and Optimization of Clinical Trial Criteria in Real-World Trial Emulation

The paper introduces AERO, an AI agent framework that optimizes clinical trial eligibility criteria for real-world data emulation by leveraging large language models to systematically classify and refine criteria, thereby improving the generalizability and accuracy of treatment effect estimates as demonstrated in a WARCEF trial emulation.

Li, X., James, J., Pellikka, P. A., Zong, N. | 2026-05-01 | health informatics

Integrating Group and Individual Fairness Auditing in Clinical AI: A Post-Hoc, Model-Agnostic Approach

This paper introduces EquiLense, a practical, post-hoc, and model-agnostic auditing tool that bridges the gap between group and individual fairness assessments in clinical AI by utilizing a novel metric called Mean Predicted Probability Difference (MPPD) to identify systematic prediction inconsistencies across demographic groups.

Xu, J., Hwang, Y. M., Kondareddy, S., Dormoy, I., Jing, S. L., Pillai, M., Curtin, C. M., Hernandez-Boussard, T. | 2026-04-30 | health informatics

MIMIC-IV-Phenotype-Atlas (MIPA): A Publicly Available Dataset for EHR Phenotyping

The paper introduces MIMIC-IV-Phenotype-Atlas (MIPA), the first publicly available benchmark dataset featuring expert-annotated discharge summaries across 16 phenotypes, which enables standardized evaluation of phenotyping methods and demonstrates that large language models outperform traditional rule-based and machine learning approaches in identifying complex medical conditions.

Yamga, E., Goudrar, R., Despres, P. | 2026-04-24 | health informatics

Stakeholder perspectives on the use of enhanced mobile phone capabilities for public health surveillance for non-communicable disease risk factors: A qualitative study

This qualitative study of stakeholders in Uganda highlights that while mobile phone-based tools offer significant potential for improving non-communicable disease surveillance in low-resource settings, their successful and responsible implementation depends on proactively addressing critical ethical, legal, and social challenges related to privacy, equity, and data governance.

Mwaka, E. S., Nabukenya, S., Kasiita, V., Bagenda, G., Rutebemberwa, E., Ali, J., Gibson, D. | 2026-04-23 | health informatics

Decision Curve Analysis for Evaluating Machine Learning Models for Next-Day Transfer Out of ICU

This study demonstrates that Decision Curve Analysis provides a superior framework for evaluating machine learning models predicting next-day ICU transfers by quantifying clinical decision utility and optimizing operational thresholds to align with real-world workflow constraints, outperforming traditional discrimination metrics and simple clinical rules.

Pozo, M., Pape, A., Locke, B., Pettine, W. W. | 2026-04-21 | health informatics
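
For readers unfamiliar with Decision Curve Analysis, the core quantity is the net benefit of acting on a model at a chosen probability threshold, compared with the default of treating everyone as positive. The sketch below uses the standard textbook formula, not the authors' code. The function names and the four-patient toy data are hypothetical and exist only for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging every case whose predicted risk >= threshold.

    Standard DCA formula: TP/n - (FP/n) * threshold / (1 - threshold).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that flags every case."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data: 1 = event occurred, 0 = it did not.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.8, 0.3, 0.1]

for t in (0.2, 0.5):
    print(t, net_benefit(y_true, y_prob, t), net_benefit_treat_all(y_true, t))
```

A decision curve plots these values over a range of thresholds. A model is useful at a given threshold only if its net benefit exceeds both the treat-all value and zero, which is the net benefit of flagging no one.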